Ambient sound responsive media player

ABSTRACT

Some embodiments of the present invention provide a method of adjusting an output of a media player comprising capturing an ambient audio signal; processing the ambient audio signal to determine whether one or more characteristic forms are present within the ambient audio signal; and reducing an output of a media player from a first volume to a second volume if the one or more characteristic forms are present within the ambient audio signal. The characteristic forms may be, for example, a name or personal identifier of a user of the media player, the voice of a user of the media player, or an alarm or siren.

This application is a continuation-in-part of U.S. patent application Ser. No. 11/267,079 filed Nov. 3, 2005, which claims the benefit of U.S. Provisional Patent Application No. 60/665,291 filed Mar. 26, 2005 and U.S. Provisional Application No. 60/648,197 filed Jan. 27, 2005, all of which are incorporated in their entirety herein by reference.

This application is also a continuation-in-part of U.S. patent application Ser. No. 11/223,368 filed Sep. 9, 2005, which claims the benefit of U.S. Provisional Patent Application No. 60/644,417 filed Jan. 15, 2005, both of which are incorporated in their entirety herein by reference.

This application is also a continuation-in-part of U.S. patent application Ser. No. 11/610,615 filed Dec. 14, 2006, which claims the benefit of U.S. Provisional Patent Application No. 60/793,214 filed Apr. 19, 2006, both of which are incorporated in their entirety herein by reference.

This application also claims the benefit of U.S. Provisional Patent Application No. 60/841,990 filed Aug. 31, 2006, which is incorporated in its entirety herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to media players, and more specifically to responsive media players.

2. Discussion of the Related Art

Portable media players have become popular personal entertainment devices due to their highly portable nature, their ability to provide access to a large library of stored media files, and their interconnectivity with existing computer networks, for example the Internet. The ease of downloading music and other electronic media continues to fuel the popularity of these devices, as exemplified by Apple Computer, Inc.'s highly successful iPod™ portable media player. Other manufacturers offer competing media players with various functionalities and file format compatibilities in an effort to differentiate their products in the marketplace.

As discussed in U.S. Patent Application No. 2004/0224638 A1, which is herein incorporated by reference in its entirety, an increasing number of consumer products are incorporating circuitry to play musical media files and other electronic media. For example, many portable electronic devices such as cellular telephones and personal digital assistants (PDAs) include the ability to play electronic musical media in many of the most commonly available file formats including MP3, AVI, WAV, MPG, QT, WMA, AIFF, AU, RAM, RA, MOV, MIDI, etc. With a wide variety of devices and file formats emerging, it is expected that in the near future a large segment of the population will have upon their person an electronic device with the ability to access music files from a library of media files in local memory and/or over a computer network, and play those music files at will. Such users generally wear headphones to experience music content in a personalized high fidelity manner.

Because most users of portable media players wear headphones that play music directly into their ears, users experience the beneficial effect of separating themselves from the noises of daily life, providing a serene audio environment of personally played music. Unfortunately, users often miss important sound events within the real world when listening to music through the headphones of a portable media player. For example, another person might be talking to the media player user, but because of the music playing through the headphones, the user is unable to hear that he or she has been verbally addressed. Similarly, a siren or alarm may sound in the environment of a headphone-wearing media player user, but the user may not hear the warning sound effectively, thus creating a dangerous situation. Finally, a headphone-wearing media player user may try to talk to someone else within his or her immediate environment, but because the user cannot hear his or her own voice, the user may end up talking substantially too loudly for the current situation. This may embarrass the user.

SUMMARY OF THE INVENTION

Several embodiments of the invention advantageously address the needs above as well as other needs by providing a media player that is responsive to ambient sound.

In some embodiments, the invention can be characterized as a method of adjusting an output of a media player comprising capturing an ambient audio signal; processing the ambient audio signal to determine whether one or more characteristic forms are present within the ambient audio signal; and reducing an output of a media player from a first volume to a second volume if the one or more characteristic forms are present within the ambient audio signal.

In some embodiments, the invention can be characterized as a method of adjusting an output of a media player comprising capturing an ambient audio signal; processing the ambient audio signal to determine whether one or more characteristic forms are present within the ambient audio signal; and mixing at least a portion of the ambient audio signal with a first output of a media player to generate a second output of the media player if the one or more characteristic forms are present within the ambient audio signal.

In some embodiments, the invention can be characterized as an apparatus for use in a media player comprising a microphone; and one or more processors adapted to: process an ambient audio signal received by the microphone to determine whether one or more characteristic forms are present within the ambient audio signal, and adjust an output of a media player if the one or more characteristic forms are present within the ambient audio signal.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features and advantages of several embodiments of the present invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings.

FIG. 1 depicts a generalized block diagram of a media player in accordance with some embodiments of the present invention;

FIG. 2 depicts a flow chart of a process of an ambient sound responsive media player unit in accordance with some embodiments of the present invention.

Corresponding reference characters indicate corresponding components throughout the several views of the drawings. Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention.

DETAILED DESCRIPTION

The following description is not to be taken in a limiting sense, but is made merely for the purpose of describing the general principles of exemplary embodiments. The scope of the invention should be determined with reference to the claims.

There currently exists a need to provide intelligent volume control of media content played through headphones (or other similar headsets and earpieces), such that a headphone-wearing media player user may more easily hear when he or she is verbally addressed, when an alarm or siren sounds within his or her environment, and/or when he or she is speaking aloud.

This disclosure addresses the deficiencies of the relevant art and provides exemplary system, method and computer program product embodiments of an ambient sound responsive portable media player that intelligently adjusts and/or varies the playing volume of a musical media file based at least in part upon sounds detected in the ambient environment of the user. More specifically, the present invention provides an ambient sound responsive media player in which the musical sounds played to a user through the headphones of a media player are moderated based at least in part upon ambient sounds detected within the user's local environment. The system incorporates a microphone in the media player system, the microphone being configured to detect sounds from the ambient environment of the media player user as the user listens to music through headphones. The system further includes a processor that adjusts the volume of playing media content based at least in part upon ambient audio signals detected by the microphone. The processor of the present invention may be configured through hardware and software components to perform one or more of the following functions:

(A) Name responsive volume reduction. In this function, the playing volume of a currently playing media file is automatically reduced by the processor for a period of time in response to the media player user's name being detected as verbal content within the audio signal captured from the ambient environment. In this way, if another person calls the user's name, presumably to talk to that user, the media player responds by automatically reducing the playing volume of media content to that user.

(B) User voice responsive volume reduction. In this function, the playing volume of a currently playing media file is automatically reduced by the processor for a period of time in response to the media player user's own voice being detected within the audio signal captured from the ambient environment. In this way, if the media player user begins speaking aloud into the ambient environment, the media player automatically reduces the playing volume of media content to that user so the user can more easily hear himself or herself talk. This prevents the user from speaking too loudly into the ambient environment and embarrassing himself or herself.

(C) Alarm sound volume reduction. In this function, the playing volume of a currently playing media file is automatically reduced by the processor for a period of time in response to an alarm sound or siren sound being detected within the audio signal captured from the ambient environment. In this way, if an alarm or siren sounds within the user's local environment, presumably because there is a danger to be alerted to, the media player automatically reduces the playing volume of media content to that user so that the user will more easily hear the alarm sound.
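For illustration only, the detection tests behind functions (A) and (B) might reduce to checks like the following Python sketch. It assumes that a speech recognizer has already produced a text transcript and that a voice identity system produces fixed-length feature vectors ("voiceprints"). The function names, the whole-word matching rule, and the 0.8 cosine-similarity threshold are hypothetical and are not taken from the recognition methods incorporated by reference in this specification.

```python
import math

def name_called(transcript, user_names):
    """(A) True if one of the user's names appears as a whole word in a transcript."""
    # Normalize words by stripping common punctuation and lowercasing.
    words = {w.strip(".,!?;:\"'").lower() for w in transcript.split()}
    return any(name.lower() in words for name in user_names)

def is_user_voice(features, voiceprint, threshold=0.8):
    """(B) True if a voice feature vector is close (cosine similarity) to the enrolled voiceprint."""
    dot = sum(a * b for a, b in zip(features, voiceprint))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(sum(b * b for b in voiceprint))
    if norm == 0.0:
        return False  # silence or an all-zero vector cannot match
    return dot / norm >= threshold
```

Whole-word matching deliberately rejects "Samantha" when the name is "Sam"; multi-word names would need the recognizer's own keyword-spotting output instead.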

In some embodiments of the present invention, the media player is operative to mix musical audio content derived from a stored media file with ambient audio content captured from a microphone local to the user. In this way the user can listen to musical media content in audio combination with ambient audio signals from the local environment. While such a function may enable a user to more easily hear sounds such as other speaking users, the user's own voice, and/or alarms and sirens, such a mixed audio signal may be unpleasant during times when such events are not occurring. Thus some embodiments of the present invention include an inventive method in which the relative volume balance of the mixed signal (i.e., the relative volume of the musical media content and the ambient microphone content) is selectively adjusted in response to detected ambient audio events. More specifically, the relative volume of the microphone content is automatically increased with respect to the musical media content within the mixed audio signal in response to detected ambient audio events such as (A) detection of the media player user's name being uttered within the ambient audio signal, (B) detection of the media player user's own voice within the ambient audio signal, and/or (C) detection of an alarm or siren sound present within the ambient audio signal.
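A minimal sketch of this balance adjustment, assuming both signals are already frames of float samples in the range -1 to 1 at a common sample rate. The function name and the gain values (0.1 ambient share in the quiet state, 0.9 while an event is detected) are illustrative assumptions. Because the two gains always sum to 1, the mix cannot exceed the input range.

```python
def mix_frame(media, ambient, event_detected, quiet_gain=0.1, event_gain=0.9):
    """Mix equal-length frames of media and ambient samples (floats in [-1, 1])."""
    if len(media) != len(ambient):
        raise ValueError("media and ambient frames must have the same length")
    # Raise the ambient share of the mix while a characteristic form is detected.
    ambient_gain = event_gain if event_detected else quiet_gain
    media_gain = 1.0 - ambient_gain
    return [media_gain * m + ambient_gain * a for m, a in zip(media, ambient)]
```

Switching the gains abruptly between frames can produce audible clicks, so a practical version would move the gains gradually over several frames.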

The present invention provides a system, method and computer program product which enables a media player to intelligently adjust and/or vary the playing volume of a musical media file to a user based at least in part upon detected sounds from the ambient environment of the user. More specifically, the present invention provides an ambient sound responsive media player in which the musical sounds played to a user through the headphones of a media player are moderated based at least in part upon detected ambient sounds from within the user's local environment. In some embodiments, ambient sounds from the local environment are selectively mixed with digital media sounds such that their relative volumes are adjusted based at least in part upon detected ambient sound events within the user's local environment. Where necessary, computer programs, routines and algorithms are envisioned to be programmed in a high level language, for example Java™, C++, C, C#, or Visual Basic™.

Referring to FIG. 1, a generalized block diagram of a media player 100 is depicted. The media player 100 includes a communications infrastructure 90 used to transfer data, memory addresses where data items are to be found, and control signals among the various components and subsystems of the media player 100.

A central processor 5 is provided to interpret and execute logical instructions stored in the main memory 10. The main memory 10 is the primary general purpose storage area for instructions and data to be processed by the central processor 5. The main memory 10 is used in its broadest sense and includes RAM, EEPROM and ROM. A timing circuit 15 is provided to coordinate activities within the media player 100. The central processor 5, main memory 10 and timing circuit 15 are directly coupled to the communications infrastructure 90.

A display interface 20 is provided to drive a display 25 associated with the media player 100. The display interface 20 is electrically coupled to the communications infrastructure 90 and provides signals to the display 25 for visually outputting both graphics and alphanumeric characters. The display interface 20 may include a dedicated graphics processor and memory to support the display of graphics intensive media. The display 25 may be of any type (e.g., cathode ray tube, gas plasma) but in most circumstances will be a solid state device such as a liquid crystal display.

A secondary memory subsystem 30 is provided which houses retrievable storage units such as a hard disk drive 35, a removable storage drive 40, an optional logical media storage drive 45 and an optional removable storage unit 50.

The removable storage drive 40 may be a replaceable hard drive, an optical media storage drive or a solid state flash RAM device. The logical media storage drive 45 may be a flash RAM device, an EEPROM encoded with playable media, or optical storage media (CD, DVD). The removable storage unit 50 may be of a logical, optical or electromechanical (hard disk) design.

A communications interface 55 subsystem is provided which allows for standardized electrical connection of peripheral devices to the communications infrastructure 90, including serial, parallel, USB, and FireWire connectivity. For example, a user interface 60 and a transceiver 65 are electrically coupled to the communications infrastructure 90 via the communications interface 55. For purposes of this disclosure, the term user interface 60 includes the hardware and operating software by which a user executes procedures on the media player 100 and the means by which the media player conveys information to the user.

The user interface 60 employed on the media player 100 includes a pointing device (not shown) such as a mouse, thumbwheel or track ball, an optional touch screen (not shown), one or more pushbuttons (not shown), one or more sliding or circular rheostat controls (not shown), one or more switches (not shown), and one or more tactile feedback units (not shown). One skilled in the relevant art will appreciate that the user interface devices which are not shown are well known and understood.

To accommodate non-standardized (i.e., proprietary) communications interfaces, an optional separate auxiliary interface 70 and auxiliary I/O port 75 are provided to couple proprietary peripheral devices to the communications infrastructure 90.

The transceiver 65 facilitates the remote exchange of data and synchronizing signals between and among the various media players 100A, 100B, 100C in processing communications 85 with this media player 100.

The transceiver 65 is envisioned to be of a radio frequency type normally associated with computer networks, for example, wireless computer networks based on BlueTooth™ or the various IEEE 802.11x standards, where x denotes the various present and evolving wireless computing standards.

Alternately, the transceiver 65 may employ digital cellular communications formats compatible with, for example, GSM, 3G, and evolving cellular communications standards. Both peer-to-peer (P2P) and client-server models are envisioned for implementation of the invention. In a third alternative embodiment, the transceiver 65 may include hybrids of computer communications standards, cellular standards and evolving satellite radio standards.

Lastly, an audio subsystem 95 is provided and electrically coupled to the communications infrastructure 90. The audio subsystem is configured for the playback and recording of digital media, for example, multimedia encoded in any of the exemplary formats MP3, AVI, WAV, MPG, QT, WMA, AIFF, AU, RAM, RA, MOV, MIDI, etc.

The audio subsystem includes a microphone 95A which is used to detect sound signals from the user's local ambient environment. The microphone 95A may be incorporated within the casing of the portable media player, or it may be located remotely elsewhere upon the body of the user and connected to the media player by a wired or wireless link. Ambient sound signals from microphone 95A are generally captured as analog audio signals and converted to digital form by an analog to digital converter or other similar component and/or process. A digital signal is thereby provided to the processor of the media player, the digital signal representing the ambient audio content captured by microphone 95A. In some embodiments the microphone 95A is local to the headphones or other head-worn component of the user. In some embodiments the microphone is interfaced to the media player by a Bluetooth communication link. In some embodiments the microphone comprises a plurality of microphone elements.

The audio subsystem also includes headphones 95B (or other similar personalized audio presentation units that present audio content to the ears of a user). The headphones may be connected by wired or wireless connections. In some embodiments the headphones are interfaced to the media player by a Bluetooth communication link.

As referred to in this specification, “media items” refers to video, audio, streaming and any combination thereof. In addition, the audio subsystem is envisioned to optionally include features such as graphic equalization, volume, balance, fading, bass and treble controls, surround sound emulation, and noise reduction. One skilled in the relevant art will appreciate that the above cited list of file formats is not intended to be all inclusive.

The media player 100 includes an operating system, the hardware and software drivers necessary to fully utilize the devices coupled to the communications infrastructure 90, media playback and recording applications, and at least one ambient sound responsive volume adjustment program operatively loaded into main memory 10. Optionally, the media player 100 is envisioned to include at least one remote authentication application, one or more cryptography applications capable of performing symmetric and asymmetric cryptographic functions, and secure messaging software. Optionally, the media player 100 may be disposed in a portable form factor to be carried by a user.

Referring to FIG. 2, shown is a flow chart of a process of an ambient sound responsive media player unit in accordance with some embodiments of the present invention. The program flow shown would generally be performed in parallel with other processes performed by the media player, including processes that select and/or play media items by accessing media content from memory and outputting an audio representation of such media content through headphones and/or other similar audio presentation hardware. The program flow shown would generally be performed, at least in part, by routines running upon a processor of the portable media player, namely by at least a portion of at least one ambient sound responsive volume adjustment program operatively loaded into main memory 10. In the particular embodiment shown herein, the entire program flow is performed by the at least one ambient sound responsive volume adjustment program operatively loaded into main memory 10. At the time the program flow begins, the media player has already selected and begun to play a media file through a separate process (not shown).

The program flow of FIG. 2 begins at step 200, generally in response to a function call or other programming flow construct. Once started, the program flow performs a continuous loop until terminated. The continuous loop includes a number of steps which may be performed in a variety of orders. In the particular flow shown in FIG. 2, the first step in the continuous loop is step 201, wherein ambient audio signals are captured through microphone 95A. These ambient audio signals are generally captured as analog signals from the microphone element and then digitized through an analog to digital conversion process. In addition, noise reduction, filtering, and/or other commonly known signal processing steps may be performed upon the ambient signal. The ambient audio signals, once converted to a final digital form, are generally stored in a temporary local memory of the portable media player. It should be noted that this ambient audio signal capture step 201 may be performed by a separate process that runs in parallel with the program flow of FIG. 2. This separate process may, for example, store digitized ambient audio signals into a shared memory space that is accessible by the steps of this program flow.
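One way the separate capture process and the FIG. 2 loop could share digitized samples is a fixed-capacity buffer guarded by a lock, as in this Python sketch. The class name `AmbientBuffer` and its methods are hypothetical; an embedded implementation might instead use a lock-free ring buffer in native code.

```python
import threading
from collections import deque

class AmbientBuffer:
    """Fixed-capacity sample store shared by a capture thread and the FIG. 2 loop."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)  # oldest samples drop off when full
        self._lock = threading.Lock()

    def write(self, samples):
        # Called by the capture process (step 201) with newly digitized samples.
        with self._lock:
            self._samples.extend(samples)

    def latest(self, n):
        # Called by the processing loop (step 202); returns up to n most recent samples.
        with self._lock:
            data = list(self._samples)
        return data[-n:] if n > 0 else []
```

The copy inside `latest` keeps the lock held only briefly, so the capture thread is never blocked while recognition runs.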

The process then proceeds to step 202, wherein additional signal processing is performed on the captured ambient signal. This signal processing may include sound recognition processing, speech recognition processing, and/or vocal identity recognition processing steps and/or sub-steps. Because sound recognition, speech recognition, and/or vocal identity recognition processes are known in the prior art, the specifics of such processes will not be described in detail herein. For example, U.S. Pat. No. 4,054,749 and U.S. Pat. No. 6,298,323, each of which is hereby incorporated by reference, disclose methods and apparatus for voice identity recognition wherein a particular user's voice may be identified as being present within an audio signal within certain accuracy limits. Similarly, U.S. Pat. No. 6,804,643, which is hereby incorporated by reference, discloses a speech recognition system in which particular verbal utterances may be identified within an audio signal, the particular verbal utterances including particular words, phrases, names, and other verbal constructs. Similarly, other pieces of art disclose methods and systems by which particular non-verbal sounds may be identified within an ambient sound signal. One example of such sound recognition methods is disclosed in HABITAT TELEMONITORING SYSTEM BASED ON THE SOUND SURVEILLANCE by Castelli, Vacher, Istrate, Besacier, and Sérignat, which is hereby incorporated by reference. Another example of such sound recognition methods is disclosed in a 1999 doctoral dissertation from MIT by Keith Dana Martin entitled Sound-Source Recognition: A Theory and Computational Model, which is hereby incorporated by reference. Another example of such sound recognition methods is disclosed by Michael Casey in the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 6, JUNE 2001, in a paper entitled MPEG-7 Sound-Recognition Tools, which is hereby incorporated by reference. Such papers explain that recent advances in pattern recognition methodologies make possible the automatic identification of characteristic environmental sounds, animal sounds, non-verbal human utterances, and other non-verbal environmental sounds. Using such techniques, for example, alarm sounds and/or siren sounds may be identified within an ambient audio signal.
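The specification relies on the cited pattern recognition methods for alarm and siren identification. As a much simpler stand-in, useful only for steady alarm tones, the Goertzel algorithm can measure how much of a frame's energy lies at a single frequency. In this Python sketch the function names, the alarm frequencies (1000 Hz and 3000 Hz), and the 0.5 threshold are assumptions; real sirens sweep in frequency and would need the more robust classifiers cited above.

```python
import math

def tone_ratio(samples, sample_rate, target_hz):
    """Fraction of frame energy at target_hz via Goertzel (about 1.0 for a pure on-bin tone)."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return 0.0
    k = round(n * target_hz / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of DFT bin k.
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return 2.0 * power / (n * energy)

def looks_like_alarm(samples, sample_rate, alarm_hz=(1000.0, 3000.0), threshold=0.5):
    """Hypothetical check: does most of the frame's energy sit on a known alarm tone?"""
    return any(tone_ratio(samples, sample_rate, f) >= threshold for f in alarm_hz)
```

Normalizing by the frame energy makes the test independent of how loud the alarm is at the microphone.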

Thus by using the prior art methods of speech recognition, voice identity recognition, and environmental sound identification, the ambient sound signal captured by microphone 95A and stored in local memory may be processed such that (A) the utterance of the media player user's name may be identified if substantially present within the captured ambient audio signal, (B) the unique voice of the media player user may be identified if substantially present within the captured ambient audio signal, and/or (C) the sound of an alarm and/or siren and/or other similar emergency related alert sound may be identified if substantially present within the captured ambient audio signal. These identifications are performed in step 202. Note that, in general, this step is performed upon a certain time-sample's worth of ambient audio signal during each loop of the program flow, and the time-samples generally proceed as overlapping time windows with each loop of the program flow.
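The overlapping time windows can be produced with a simple hop-based slicer, sketched below in Python. The function name and the window and hop lengths are assumptions; at a 16 kHz sample rate, for example, `window=16000` and `hop=8000` would give one-second windows with 50% overlap.

```python
def overlapping_windows(samples, window, hop):
    """Yield successive windows of `window` samples, advancing by `hop` samples."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    # A hop smaller than the window makes consecutive windows overlap.
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]
```

Overlap means a name or alarm that straddles the boundary of one window still falls wholly inside the next.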

The process then proceeds to step 203, wherein a set of conditional routines are performed based upon whether or not a characteristic form (e.g., a signal conforming to A, B, or C above) is identified as present within the ambient signal. A characteristic form is a sound or signal that, when detected by the media player, will cause an audible adjustment to the output of the media player such that the user is better able to hear ambient sounds. Thus in step 203, conditional routines are performed based upon whether or not the ambient signal has been identified to contain one or more of (A) a verbal utterance of the media player user's name by another user, (B) a verbal utterance of any kind from the media player user himself or herself, or (C) the non-verbal sound of an alarm and/or siren and/or other similar emergency related alert. If one or more of such characteristic forms are present within the ambient audio signal, the process proceeds along arrow 204 to step 206. If not, the process proceeds along arrow 205 to step 207. These two alternate paths are described as follows:
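The branching of steps 201 through 208 can be sketched as a loop that delegates each step to a callable, as below. The `Form` enumeration, the callable names, and the bounded `iterations` parameter (added so the loop can be tested) are assumptions of this sketch; the branching itself follows FIG. 2 as described.

```python
from enum import Enum, auto

class Form(Enum):
    USER_NAME = auto()   # (A) another user utters the media player user's name
    USER_VOICE = auto()  # (B) the media player user's own voice
    ALARM = auto()       # (C) alarm, siren, or similar emergency alert

def run_loop(capture, detect, reduce_volume, resume_volume, hold, iterations):
    """Run `iterations` passes of the FIG. 2 loop; each step is supplied as a callable."""
    for _ in range(iterations):
        window = capture()        # step 201: latest ambient samples
        forms = detect(window)    # step 202: set of Form values identified
        if forms:                 # step 203, "yes" branch (arrow 204)
            reduce_volume(forms)  # step 206
            hold(forms)           # step 208 (optional time delay)
        else:                     # "no" branch (arrow 205)
            resume_volume()       # step 207
```

Passing the steps in as callables lets each recognizer or volume routine be replaced without touching the control flow.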

In the “yes” branch, the process proceeds along arrow 204 to step 206. At step 206, the routines of the present invention perform an Intelligent Automatic Volume Reduction routine in which the currently playing media audio signal is automatically reduced in volume so that the user can better hear the ambient sounds around him or her. This reduction in playing volume may be performed abruptly. Alternately, the volume reduction may be performed gradually over a period of time. In general the period of time is short, for example 1500 milliseconds. The volume may be reduced by a fixed amount, for example to 65% of the nominal volume level set by the user, or may be reduced by an amount that is dependent upon the volume level of the identified characteristic ambient sound that triggered the reduction. In some embodiments the user may set a configuration parameter that indicates the desired volume reduction level upon the identification of a characteristic ambient sound event. The volume reduction level may be set as a percentage of the nominal volume level at which the user is currently listening. Alternately, the volume reduction level may be set to a defined low value on the absolute volume scale of the unit (for example, to a value of 2 on a scale of 10). Once this automatic volume reduction step is complete, the process flows to step 208, which is described below.
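The two configurable reduction levels described above (a percentage of the nominal volume, or a fixed low value on the unit's absolute scale) might be computed as in this Python sketch. The 0 to 10 scale, the 65% default and the level of 2 come from the examples in the text; the function name, the mode strings, and the rule that the target never exceeds the nominal volume are assumptions. A level-dependent reduction is not shown because the text does not specify its mapping.

```python
def reduced_volume(nominal, mode="percent", percent=65.0, absolute_level=2.0, scale_max=10.0):
    """Target volume for step 206 on a 0..scale_max scale; never louder than nominal."""
    if mode == "percent":
        target = nominal * percent / 100.0  # e.g. 65% of the user's setting
    elif mode == "absolute":
        target = absolute_level             # e.g. 2 on a scale of 10
    else:
        raise ValueError(f"unknown mode: {mode}")
    # A "reduction" must never raise the volume or leave the valid range.
    return max(0.0, min(target, nominal, scale_max))
```

For a user listening at level 8, the percentage mode gives about 5.2, while the absolute mode gives 2; a user already at level 1 stays at 1 in absolute mode.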

In the “no” branch, the process proceeds along arrow 205 to step 207. At step 207, the routines of the present invention restore the playing volume of the currently playing media content to (or approximately to) the normal (nominal) playing volume. The nominal playing volume is the volume at which the content would be playing had it not been reduced previously by the Intelligent Automatic Volume Reduction routines. Thus if the volume had been reduced previously by the Intelligent Automatic Volume Reduction routines of step 206, then step 207 will return the volume substantially to its normal level. This may happen abruptly. Alternately, the return of the volume to the nominal level may be performed using a gradual volume adjustment routine that restores the volume over a period of time. In some embodiments the period of time is on the order of 1500 to 3000 milliseconds. Such a time period is short enough that the event seems quick to the user, but long enough that it is not jarring. Note that if the volume was already at the nominal level when step 207 is performed, then step 207 does not perform any substantial change in volume level. Once step 207 completes, the process loops back to the beginning, returning to step 201. In this way the routine continues to capture and process a steady stream of ambient audio signals and responds accordingly with volume reduction and/or resumption.
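Both the gradual reduction of step 206 and the gradual resumption of step 207 can use the same linear ramp. This Python sketch returns the intermediate volume levels to apply at each update tick; the function name and the 50 ms tick are assumptions, and the final level is pinned to the exact target to avoid floating-point drift.

```python
def volume_ramp(start, end, duration_ms, step_ms=50):
    """Volume levels for each update tick while moving linearly from start to end."""
    if duration_ms <= 0 or step_ms <= 0:
        return [end]  # abrupt change
    steps = max(1, round(duration_ms / step_ms))
    levels = [start + (end - start) * i / steps for i in range(1, steps + 1)]
    levels[-1] = end  # land exactly on the target despite rounding
    return levels
```

For example, `volume_ramp(5.2, 8.0, 2000)` yields 40 levels that restore the volume over two seconds, inside the 1500 to 3000 millisecond range mentioned above.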

If a characteristic form was identified within the ambient signal in step 203 and the playing volume of the media content was reduced at step 206, the process then proceeds to step 208, wherein a time delay may optionally be performed. The time delay ensures that the volume reduction lasts for at least some amount of time beyond the identification of the characteristic form within the ambient signal. In general, this amount of time may be set by the user through a configuration process. This amount of time may be, for example, 3 to 6 seconds. In this way, if the routines of the present invention identify, for example, that somebody called the name of the media player user, the volume reduction does not just occur for a split second upon the identification, but lasts for a number of seconds thereafter, so that the user may hear what is said to him or her immediately after his or her name was called. In some embodiments the volume reduction lasts indefinitely, or until the user explicitly resumes normal volume by pressing a button or otherwise engaging the user interface upon his or her media player. The process then loops back to step 201. In this way the routines of the present invention are configured to continually capture and process a steady stream of ambient audio signals and respond accordingly with volume reduction and/or resumption. In general the volume reductions linger for some time delay period after each identified characteristic form within the ambient signal. In some embodiments the duration of the time delay is dependent upon the type of characteristic form identified. For example, if the characteristic form is an alarm sound, the time delay may not last long beyond the cessation of the alarm sound, presumably because the emergency alert is over. Alternately, if the characteristic form is a vocal call of the user's name by another user, the time delay is generally set long enough to allow the user to hear what else the other user says after the name call.
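Rather than blocking the loop with a sleep at step 208, an implementation could record a hold deadline that step 207 checks before resuming, so that ambient capture continues during the delay. That is a design choice of this Python sketch, not something the text requires. The class name and the per-form durations are hypothetical values chosen to reflect the ordering suggested above: a longer hold after a name call, and a short hold after an alarm, which is re-triggered for as long as the alarm keeps being detected. An indefinite hold can be expressed as `float("inf")`.

```python
# Hypothetical hold durations in seconds for each characteristic form.
HOLD_SECONDS = {"user_name": 6.0, "user_voice": 3.0, "alarm": 1.0}

class ReductionHold:
    """Tracks how long the step 206 volume reduction should persist (step 208)."""

    def __init__(self, hold_seconds=None):
        self._hold = dict(HOLD_SECONDS if hold_seconds is None else hold_seconds)
        self._until = float("-inf")

    def trigger(self, form, now):
        # Extend, never shorten, the hold when a form is detected at time `now`.
        self._until = max(self._until, now + self._hold[form])

    def release(self):
        # Explicit user action (e.g. a button press) ends the hold immediately.
        self._until = float("-inf")

    def active(self, now):
        # Step 207 should resume the nominal volume only when this returns False.
        return now < self._until
```

In a live loop, `now` would typically come from a monotonic clock such as `time.monotonic()`.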

In one particular embodiment, the time delay is set to last for as long as the user who called the media player user's name continues to speak. This is performed based upon the detected vocal identity of this other user. Thus if a first user calls the name of the media player user and then continues to speak, the routines of the present invention may be configured to perform an automatic volume reduction upon detecting the name call uttered by the first user and to maintain the volume reduction for at least as long as the first user's voice continues to be identified without a time gap of more than some threshold amount of time. The threshold is generally set such that if the first user speaks at a typical pace, the volume reduction will be maintained until the first user finishes talking.
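The gap-threshold rule above can be sketched as follows, assuming that an upstream speaker-identification stage (not specified in the text) reports the timestamps at which the calling user's voice was identified. The function name `hold_end_for_speaker` and the batch formulation are hypothetical.

```python
from typing import Sequence


def hold_end_for_speaker(speech_times: Sequence[float], gap_threshold: float) -> float:
    """Return the time at which the calling speaker is deemed to have finished.

    speech_times are sorted timestamps (s) at which the calling speaker's voice
    was identified, beginning with the name call. The hold extends through
    each timestamp until the first silence longer than gap_threshold.
    """
    if not speech_times:
        raise ValueError("speech_times must contain at least the name call")
    end = speech_times[0]
    for t in speech_times[1:]:
        if t - end > gap_threshold:
            break  # the speaker paused too long: treat the utterance as finished
        end = t
    return end
```

A streaming implementation would make the same comparison each time a new identification result arrives; the batch form is used here only for clarity.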

Additional Non-Verbal Ambient Sound Triggers: As described previously, the routines of the present invention may be configured to trigger the automatic volume reduction of playing media content on a media player in response to the detection of a characteristic non-verbal sound within the local environment, such as the sound of an alarm and/or siren and/or other similar emergency alert captured by microphone 95A of the system. In some embodiments of the present invention, the automatic volume reduction routines may be configured such that additional and/or alternate characteristic non-verbal sounds within the ambient environment may be detected and trigger the volume reduction. For example, common household sounds that a user may wish to attend to, such as a doorbell ringing, a telephone ringing, or a baby crying, may be employed as characteristic ambient sounds that trigger the automatic volume reduction routines and methods disclosed herein. In this way a user may be wearing a media player within his or her house, and if the microphone on the media player captures a characteristic sound that is substantially similar to a doorbell ringing, a phone ringing, or a baby crying, the volume of the playing media content is automatically reduced for a period of time following the detected characteristic ambient sound event.
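One simple way to decide whether a captured sound is "substantially similar" to a stored doorbell, telephone, or baby-cry template is to compare feature vectors by cosine similarity. The sketch below assumes that an unspecified upstream step reduces each audio frame to a fixed-length feature vector, such as a coarse magnitude spectrum; the template values, the 0.9 threshold, and the function names are hypothetical. A practical detector would likely use richer features and a trained classifier.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # silence matches nothing
    return dot / (norm_a * norm_b)


def match_sound(features: Sequence[float],
                templates: Dict[str, Sequence[float]],
                threshold: float = 0.9) -> Optional[str]:
    """Return the label of the best-matching template at or above threshold, or None."""
    best_label, best_score = None, threshold
    for label, template in templates.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```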

System Configuration: For embodiments of the present invention that trigger a volume reduction period based upon the detection of an utterance of the media player user's name within the ambient environment, the system is generally configured to identify one or more proper nouns that are relationally associated with the user and stored in memory as a digitized sample, an audio template, or some other stored representation that may be used for pattern matching or other speech recognition methods. For example, if the user's name is Theodore, he may configure his media player to be responsive to utterances that are substantially similar to the verbal utterance “Theodore”, “Theo”, “Teddy”, or “Ted”. In this way a single user may configure his or her media player to be volume-reduction responsive to verbal utterances of a plurality of proper nouns, i.e. personal identifiers, that are set in memory to be relationally associated with an automatic volume reduction process of the media player. The user may also configure the unit to be responsive to a first name, last name, and middle name, and/or any combination thereof. The user may also configure the unit to be responsive only to name utterances that exceed a certain volume threshold. In this way the unit may be less likely to be falsely triggered by name calls that are not meant for the user, even if they conform to a characteristic utterance associated with that user. In addition, the user may set his or her unit to be responsive to utterances that are nicknames, pen names, user names, or even other words that are not necessarily names. Thus Theodore in the example above may set his unit to be responsive to the utterance “dog-boy”. So long as his friends know to use the utterance “dog-boy” to get his attention, the configuration will work well for this user. In this way a user may set a particular word or phrase to be, effectively, a volume reduction password that his or her friends can use to get his or her attention. In general, setting a particular verbal utterance to be an identified volume reduction trigger utterance within the ambient environment involves the user uttering the word or phrase to the media player during a configuration process. Alternate methods of configuring speech recognition systems known in the art may be used as well. In addition, one or more generic words commonly used to summon attention, such as, for example, “sir”, “help”, or “excuse me”, may optionally be configured to trigger the automated volume reduction methods if such words are captured in the ambient audio signal at a volume that exceeds a certain threshold.
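The configuration options above (multiple personal identifiers, an optional loudness threshold for names, and generic attention words that must exceed a threshold) might be represented as follows. This sketch assumes a hypothetical upstream speech recognizer that returns the recognized phrase together with its level in dBFS. Matching recognized text stands in for the template matching described above, and `TriggerConfig`, `is_volume_trigger`, and the threshold values are all illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class TriggerConfig:
    """Hypothetical trigger configuration; phrases are stored in lowercase."""
    personal_identifiers: FrozenSet[str]        # names, nicknames, "password" phrases
    generic_words: FrozenSet[str]               # e.g. "sir", "help", "excuse me"
    name_min_level_db: Optional[float] = None   # None: no loudness requirement for names
    generic_min_level_db: float = -20.0         # generic words must be loud (assumed dBFS)


def is_volume_trigger(config: TriggerConfig, phrase: str, level_db: float) -> bool:
    """Decide whether a recognized phrase heard at level_db should trigger reduction."""
    key = phrase.strip().lower()
    if key in config.personal_identifiers:
        return config.name_min_level_db is None or level_db >= config.name_min_level_db
    if key in config.generic_words:
        return level_db >= config.generic_min_level_db
    return False
```

For the Theodore example, `personal_identifiers` would hold "theodore", "theo", "teddy", "ted", and "dog-boy".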

Audio Mixing Embodiments: In some embodiments of the present invention, the media player is operative to mix musical audio content derived from a stored media file with ambient audio content captured from a microphone local to the user. The methods and apparatus used to mix two separate audio signals into a single audio stream that a user may listen to are well known in the art and will not be described in detail herein. Regardless of the method used, a single audio signal is presented to the user through the headphones or other similar sound display hardware, the single audio signal including an audio combination of a musical media file accessed from a memory of the media player and an ambient audio signal derived from the signal captured by microphone 95A. The relative volume of the two component audio signals as represented in the combined mixed audio signal may depend at least in part upon a mixing balance setting supplied by the user through a user interface of the media player. In this way the user can listen to musical media content in audio combination with ambient audio signals from the local environment. It should be noted that the ambient audio signal content may be filtered or otherwise processed to remove extraneous noise and/or sound content that is outside certain magnitude and/or frequency limits or thresholds.
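As a minimal illustration of a balance-controlled mix, the sketch below forms a weighted sum of two sample buffers. It assumes floating-point samples in [-1, 1] and represents the user's mixing balance setting as a single number between 0 and 1. The function name is hypothetical, and the noise and frequency filtering mentioned above is not shown.

```python
from typing import List, Sequence


def mix_with_balance(music: Sequence[float], ambient: Sequence[float],
                     balance: float) -> List[float]:
    """Mix two equal-length sample buffers.

    balance is the user's mixing setting: 0.0 plays music only, 1.0 plays
    ambient only. Because the two weights sum to 1, the output stays within
    [-1, 1] whenever both inputs do.
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be between 0.0 and 1.0")
    if len(music) != len(ambient):
        raise ValueError("buffers must have the same length")
    return [(1.0 - balance) * m + balance * a for m, a in zip(music, ambient)]
```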

While such an inventive audio mixing function may enable a user to more easily hear sounds from his or her natural surroundings in a controlled and settable audio combination with the music he or she is listening to (including ambient sounds such as other speaking users, the user's own voice, and/or alarms and sirens), such a mixed audio signal may be unpleasant during times when such events are not occurring. For example, the user may be constantly distracted by ambient sounds in the mixed audio signal that are not important, relevant, or valuable for him or her to attend to. Thus some embodiments of the present invention include a further inventive method in which the relative volume balance of the mixed signal (i.e. the relative volume of the musical media content and the ambient microphone content) is selectively adjusted in response to detected ambient audio events. More specifically, the relative volume of the microphone content is automatically increased with respect to the musical media content, for a period of time, in response to detected characteristic ambient audio events within the ambient audio signal stream. The detected characteristic ambient audio events may include, but are not limited to: (A) the detection of the media player user's name being uttered within the ambient audio signal, (B) the detection of the media player user's own voice within the ambient audio signal, and/or (C) the detection of an alarm or siren sound present within the ambient audio signal.

In this way, a user may be listening to an audio signal that is a mixed audio combination of a musical media file and an ambient microphone signal, the relative volumes being such that the musical media file is substantially louder than the ambient microphone signal as presented within the mixed audio content. In response to the detection of a characteristic ambient audio event such as A, B, or C above, the routines of the present invention are configured to adjust the relative volumes in the mixed audio signal for a period of time such that the ambient audio signal is made substantially louder relative to the musical media content. Thus if a third party calls the name of the user of the media player, upon detection of that name being uttered, the user is presented with an audio mix of musical media and microphone data such that the user can easily hear the ambient environment as mixed with the musical media. When the period of time is over, the relative volume levels are automatically returned to their nominal values (i.e. a nominal relative volume such that the musical media content is substantially louder than the microphone content).

It should also be noted that in some embodiments the nominal relative volume levels of the two signals may be set such that the volume of the ambient microphone content is substantially zero at times when an ambient audio event has not been detected. In this way the user hears only the musical content until and unless an ambient audio event is detected. In response to such a detected ambient audio event (for example an event such as A, B, or C above), the automatic routines of the present invention adjust the relative volumes of the two signals such that the ambient microphone signal is no longer zero, instead being substantial with respect to the musical media content. In this way pure musical content is played to the user until an ambient audio event is detected; then, in response to the detected event, a mixed audio signal is presented with both musical content and ambient audio content such that the ambient audio content is clearly audible at a substantial relative volume. This change in mix volumes may be enacted abruptly or gradually. The mixed audio signal with the new relative volume levels lasts for a period of time. After the period of time, the routines of the present invention automatically return the audio to the nominal volume levels (in this case the ambient audio content going to zero volume). The return to nominal values may be abrupt or gradual.
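Choosing between the nominal and the event-time mix can be sketched as a function of the time since the last detected event. The parameter names and default values below (an event balance of 0.5 and a 5 s period) are hypothetical. A `nominal_balance` of 0.0 models the embodiment in which the ambient content is silent until an event, while a small positive value models the case in which the music is merely much louder than the ambient content.

```python
from typing import Optional


def mix_balance_at(now: float, last_event_at: Optional[float],
                   nominal_balance: float = 0.0, event_balance: float = 0.5,
                   event_period: float = 5.0) -> float:
    """Return the ambient share of the mix (0.0 to 1.0) at time `now` in seconds.

    During event_period seconds after the most recent ambient audio event the
    event balance applies; otherwise the nominal balance applies.
    """
    if last_event_at is not None and 0.0 <= now - last_event_at < event_period:
        return event_balance
    return nominal_balance
```

This version switches abruptly. A gradual transition would interpolate between the two balances over a short interval.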

Note that in some embodiments the mix is adjusted such that the musical audio content is gradually decreased to substantially zero while the ambient audio content is gradually increased up to the prior music volume level. Such a cross-fade enables the music to fade out while the ambient audio content fades in. This lasts for a period of time. After the period of time, the process reverses, the ambient audio content fading out to zero volume and the musical content fading back to its pre-event nominal volume.
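The cross-fade can be sketched as a list of gain pairs. In this illustration the fade is linear and divided into a fixed number of steps; the function name and the step count are assumptions rather than details from the text.

```python
from typing import List, Tuple


def crossfade_gains(music_level: float, steps: int) -> List[Tuple[float, float]]:
    """Return (music_gain, ambient_gain) pairs for a linear cross-fade.

    The music gain falls from music_level to 0 while the ambient gain rises
    from 0 to music_level, i.e. up to the prior music volume.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [(music_level * (1 - i / steps), music_level * (i / steps))
            for i in range(steps + 1)]


fade_in = crossfade_gains(0.8, 4)
fade_back = list(reversed(fade_in))  # the reverse fade after the event period
```

A linear fade keeps the sum of the two gains constant. Equal-power (sine/cosine) curves are a common alternative when the two signals are uncorrelated, because they avoid a perceived dip in loudness at the midpoint.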

User Interface: In some embodiments of the present invention the media player includes dedicated user interface elements such as buttons, touch screen elements, and/or other manual or vocal commands that enable a user to override the automatic volume adjustment methods disclosed herein. For example, a button may be provided on the portable media player that causes the volume levels to return to nominal values when pressed. In this way the automatic ambient sound responsive volume adjustment routines of the present invention may cause the musical media content to automatically drop in volume during an event, such as a user speaking or an alarm sounding, and the user may override the automatic volume reduction by pressing the dedicated button or engaging the other dedicated user interface element. The user can thus quickly return the volume to nominal levels if, for example, he or she realizes that the alarm is not relevant to him or her and/or that the other detected ambient audio event is not important.
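The interaction between automatic reduction and the override button might be modeled as a small state holder. `VolumeController` and its method names are hypothetical. The sketch assumes that an automatic event never raises the volume and that the button always restores the nominal level.

```python
class VolumeController:
    """Hypothetical controller combining automatic reduction with a manual override."""

    def __init__(self, nominal: float) -> None:
        self.nominal = nominal
        self.current = nominal
        self.reduced = False

    def on_characteristic_form(self, reduced_level: float) -> None:
        # Automatic reduction: an event never raises the playing volume.
        self.current = min(self.current, reduced_level)
        self.reduced = True

    def on_override_button(self) -> None:
        # Dedicated button: return immediately to the nominal volume.
        self.current = self.nominal
        self.reduced = False
```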

External Electronic Alert Signal Employed for Automatic Volume Reduction: In some embodiments of the present invention, the automatic volume reduction routines, which attenuate the volume of a playing media file for a period of time and then resume nominal volume levels thereafter, may be triggered by an external electronic alert signal detected by a wireless transceiver of the media player. In this way an external electronic device in the user's local environment, such as a home automation system, a home security system, a personal computer, or some other separate electronic device, can send a specific electronic alert signal to the user's portable media player. In response to receiving the specific electronic alert signal from the separate device within the user's local environment, the media player may automatically reduce the playing volume of the media content for a period of time. This feature is useful in a ubiquitous computing environment in which a plurality of intelligent devices may coexist within the user's local environment as he or she listens to music through the portable media player. A separate device, such as a home security system, may need to gain the user's attention and can therefore issue an electronic alert to the media player, which causes the volume to be reduced for a period of time. In some embodiments, the electronic alert signal system is used in combination with the features of the ambient sound responsive media player disclosed herein.
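The following hedged sketch shows how a received alert might be turned into a reduction window. The text does not specify a message format. The `AlertMessage` fields, the `"volume_alert"` kind, the optional sender-requested hold, and the 5 s default are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_ALERT_HOLD_S = 5.0  # hypothetical default reduction period


@dataclass
class AlertMessage:
    """Hypothetical payload received over the player's wireless link."""
    kind: str                             # e.g. "volume_alert"
    sender: str                           # e.g. "home_security"
    hold_seconds: Optional[float] = None  # sender-requested period, if any


def alert_reduction_window(msg: AlertMessage,
                           received_at: float) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the volume reduction, or None if msg is not an alert."""
    if msg.kind != "volume_alert":
        return None
    hold = msg.hold_seconds if msg.hold_seconds is not None else DEFAULT_ALERT_HOLD_S
    return (received_at, received_at + hold)
```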

Thus, as disclosed herein, an ambient sound responsive media player is operative to alert a media player user to ambient audio events within his or her local environment that he or she may not be able to hear easily while listening to the currently playing media content. Furthermore, the present invention enables the user to attend to the ambient audio event for a period of time following the detected event by lowering the music volume during that period of time. The present invention may support one or more of a variety of ambient audio events, including the verbal call of the user's name by another party in the local environment; a siren, alarm, or other emergency sound audible within the local environment; the utterance of a password phrase by another party within the local environment; a verbal utterance identified to be from a user with a particular verbal identity within the local environment; and/or a verbal utterance identified to be from the media player user himself or herself. In these ways the present invention is operative to enable a user to listen to music without cutting himself or herself off from important audio events within his or her local environment. Some embodiments of the present invention are also operative to allow third parties to gain the verbal attention of a media player user who may be listening to loud music through headphones. Some embodiments of the present invention are also operative to enable a media player user to hear emergency sounds that may be important within his or her local environment. Finally, some embodiments of the present invention are also operative to enable a media player user to spontaneously begin a conversation without talking too loudly, because he or she can more easily hear himself or herself while talking.

The foregoing described embodiments of the invention are provided as illustrations and descriptions. They are not intended to limit the invention to the precise forms described. In particular, it is contemplated that functional implementation of the invention described herein may be implemented equivalently in hardware, software, firmware, and/or other available functional components or building blocks. While the invention herein disclosed has been described by means of specific embodiments, examples and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.

1. A method of adjusting an output of a media player comprising: capturing an ambient audio signal; processing the ambient audio signal to determine whether one or more characteristic forms are present within the ambient audio signal; and reducing an output of a media player from a first volume to a second volume if the one or more characteristic forms are present within the ambient audio signal.
2. The method of claim 1 wherein the one or more characteristic forms comprise a name or personal identifier of a user of the media player.
3. The method of claim 1 wherein the one or more characteristic forms comprise the voice of a user of the media player.
4. The method of claim 1 wherein the one or more characteristic forms comprise an alarm or siren.
5. The method of claim 1 wherein the one or more characteristic forms comprise one or more generic words commonly used to summon attention.
6. The method of claim 1 wherein the one or more characteristic forms comprise a common household sound selected from the group consisting of a doorbell ring, a telephone ring and a baby's cry.
7. The method of claim 1 wherein the one or more characteristic forms comprise the voice of a person other than a user of the media player.
8. The method of claim 1 wherein the reducing step further comprises reducing the output of the media player if a volume of the one or more characteristic forms exceeds a volume threshold.
9. The method of claim 1 wherein the reducing step further comprises reducing the output of the media player in a manner that is performed gradually over a period of time.
10. The method of claim 1 wherein the second volume is a fixed percentage of the first volume.
11. The method of claim 1 wherein the second volume is based at least in part upon a volume level of the one or more characteristic forms.
12. The method of claim 1 wherein the second volume is a fixed volume on an absolute volume scale of the media player.
13. The method of claim 1 further comprising: resuming the output of the media player to the first volume in a manner that is performed gradually over a period of time.
14. The method of claim 1 wherein the media player maintains the output at the second volume for a fixed duration.
15. The method of claim 1 wherein the media player maintains the output at the second volume until the media player is manually reset to the first volume.
16. The method of claim 1 wherein the media player maintains the output at the second volume for a duration dependent upon the one or more characteristic forms.
17. The method of claim 1 wherein the media player is manually reset to the first volume by actuating a button on the media player.
18. The method of claim 1 wherein the reducing step further comprises reducing the output of the media player upon receiving an electronic alert signal.
19. A method of adjusting an output of a media player comprising: capturing an ambient audio signal; processing the ambient audio signal to determine whether one or more characteristic forms are present within the ambient audio signal; and mixing at least a portion of the ambient audio signal with a first output of a media player to generate a second output of the media player if the one or more characteristic forms are present within the ambient audio signal.
20. The method of claim 19 wherein the one or more characteristic forms comprise a name or personal identifier of a user of the media player.
21. The method of claim 19 wherein the one or more characteristic forms comprise the voice of a user of the media player.
22. The method of claim 19 wherein the one or more characteristic forms comprise an alarm or siren.
23. The method of claim 19 wherein the one or more characteristic forms comprise one or more generic words commonly used to summon attention.
24. The method of claim 19 wherein the one or more characteristic forms comprise a common household sound selected from the group consisting of a doorbell ring, a telephone ring and a baby's cry.
25. The method of claim 19 wherein the one or more characteristic forms comprise the voice of a person other than a user of the media player.
26. The method of claim 19 wherein the mixing step further comprises mixing the at least a portion of the ambient audio signal with the first output of the media player to generate the second output of the media player if a volume of the one or more characteristic forms exceeds a volume threshold.
27. The method of claim 19 wherein the mixing step further comprises mixing the at least a portion of the ambient audio signal with the first output of the media player to generate the second output of the media player in a manner that is performed gradually over a period of time.
28. The method of claim 19 wherein a first volume of the at least a portion of the ambient audio signal is gradually increased and a second volume of the first output of the media player is gradually decreased.
29. The method of claim 19 wherein a first volume of the at least a portion of the ambient audio signal is substantial relative to a second volume of the first output of the media player, such that the at least a portion of the ambient audio signal is clearly audible.
30. The method of claim 19 further comprising: resuming the first output of the media player in a manner that is performed gradually over a period of time.
31. The method of claim 19 wherein the media player maintains the second output for a fixed duration following the determination of the one or more characteristic forms.
32. The method of claim 19 wherein the media player maintains the second output until the media player is manually reset to the first output.
33. The method of claim 19 wherein the media player maintains the second output dependent upon a duration of the detected one or more characteristic forms.
34. The method of claim 19 wherein the media player is manually reset to the first output by actuating a button on the media player.
35. The method of claim 19 wherein the mixing step further comprises mixing the at least a portion of the ambient audio signal with the first output of the media player to generate the second output of the media player upon receiving an electronic alert signal over a wireless link.
36. An apparatus for use in a media player comprising: a microphone; and one or more processors adapted to: process an ambient audio signal received by the microphone to determine whether one or more characteristic forms are present within the ambient audio signal, and adjust an output of a media player if the one or more characteristic forms are present within the ambient audio signal.
37. The apparatus of claim 36 wherein the one or more processors are adapted to reduce the output of the media player from a first volume to a second volume.
38. The apparatus of claim 37 wherein the one or more characteristic forms are selected from the group consisting of a name or personal identifier of a user of the media player, the voice of a user of the media player, and an alarm or siren.
39. The apparatus of claim 36 wherein the one or more processors are adapted to mix at least a portion of the ambient audio signal with a first output of the media player to generate a second output of the media player.
40. The apparatus of claim 39 wherein the one or more characteristic forms are selected from the group consisting of a name or personal identifier of a user of the media player, the voice of a user of the media player, and an alarm or siren.